BayerCLAW is a framework for orchestrating and executing jobs. It launches jobs from S3, compiles job templates, provides shared utilities for data manipulation and repository access, and sends notifications when a workflow changes state. The runner component drives the execution of each job, covering string substitutions, workspace management, caching, and quality control checks.
Components
Initializer
This component is responsible for handling the initial S3 launch process. It reads job data from S3, performs substitutions, checks for recursive launches, copies job data to the repository, and writes extended job data back to S3.
Compiler
This component compiles templates. Compilation includes handling state machine resources, capitalizing top-level keys, and substituting parameters. The component also manages state machine versions and aliases.
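The key capitalization and parameter substitution steps can be sketched as follows. This is a minimal illustration, not BayerCLAW's actual compiler code: the function names, the `${name}` placeholder syntax, and the sample template are all assumptions made for the example.

```python
import re

# Hypothetical placeholder syntax: ${name}
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def substitute_params(node, params):
    """Recursively replace ${name} placeholders in every string of a template.

    Placeholders with no matching parameter are left untouched.
    """
    if isinstance(node, str):
        return PLACEHOLDER.sub(
            lambda m: str(params.get(m.group(1), m.group(0))), node
        )
    if isinstance(node, dict):
        return {key: substitute_params(value, params) for key, value in node.items()}
    if isinstance(node, list):
        return [substitute_params(item, params) for item in node]
    return node


def capitalize_top_level_keys(spec):
    """Upper-case the first character of each top-level key; nested keys are unchanged."""
    return {key[:1].upper() + key[1:]: value for key, value in spec.items()}


# Hypothetical template, for illustration only.
template = {
    "repository": "s3://${bucket}/runs",
    "steps": [{"image": "tool:${tag}"}],
}
compiled = capitalize_top_level_keys(
    substitute_params(template, {"bucket": "my-bucket", "tag": "1.0"})
)
print(compiled)
```

Only the top-level keys are capitalized in this sketch; keys inside `steps` keep their original case.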
Common Data Utilities
This component provides shared utility functions for data manipulation, including substituting job data into strings and filenames, and selecting file contents from various formats like JSON, YAML, and CSV. It also includes repository utility functions for managing S3 files.
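As a rough sketch of what substituting job data into a string or filename might look like, the example below resolves dotted field references against a nested job data dictionary. The `${a.b}` reference syntax and the names `lookup` and `substitute_job_data` are assumptions for illustration; the real utilities may use a different syntax and error handling.

```python
import re

# Hypothetical reference syntax: ${field} or ${nested.field}
FIELD = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def lookup(data, dotted_path):
    """Follow a dotted path such as 'sample.id' through nested dictionaries."""
    current = data
    for part in dotted_path.split("."):
        current = current[part]  # raises KeyError if the field is missing
    return current


def substitute_job_data(text, job_data):
    """Replace every ${path} reference in text with the matching job data value."""
    return FIELD.sub(lambda m: str(lookup(job_data, m.group(1))), text)


# Hypothetical job data, for illustration only.
job_data = {"sample": {"id": "S42"}, "lane": 3}
filename = substitute_job_data("${sample.id}_L${lane}_R1.fastq", job_data)
print(filename)
```

Failing loudly on a missing field is a design choice of this sketch: a silently unresolved reference in a filename would be harder to diagnose later in the run.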
Notifications
This component generates and sends notification messages for workflow state changes. For each message it builds the message attributes and the SNS payload.
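The sketch below shows one way message attributes and an SNS payload could be assembled. The attribute names, message fields, and function names are hypothetical; only the overall shape (a `MessageAttributes` mapping of `DataType`/`StringValue` entries, plus `TopicArn`, `Subject`, and `Message`) follows the SNS publish request format. No AWS call is made here.

```python
import json


def make_message_attributes(workflow_name, status):
    """Build SNS message attributes (hypothetical attribute names)."""
    return {
        "workflow_name": {"DataType": "String", "StringValue": workflow_name},
        "status": {"DataType": "String", "StringValue": status},
    }


def make_sns_payload(topic_arn, workflow_name, execution_id, status):
    """Assemble keyword arguments suitable for an SNS publish request."""
    message = json.dumps(
        {
            "workflow_name": workflow_name,
            "execution_id": execution_id,
            "status": status,
        },
        sort_keys=True,
    )
    return {
        "TopicArn": topic_arn,
        # SNS limits subjects to 100 characters, so truncate defensively.
        "Subject": f"{workflow_name} {status}"[:100],
        "Message": message,
        "MessageAttributes": make_message_attributes(workflow_name, status),
    }


# Hypothetical values, for illustration only.
payload = make_sns_payload(
    "arn:aws:sns:us-east-1:123456789012:example-topic",
    "my-workflow",
    "exec-001",
    "SUCCEEDED",
)
print(payload["Subject"])
```

With boto3, a dictionary of this shape could be passed as `sns_client.publish(**payload)`. Putting the status in the message attributes lets subscribers filter on it without parsing the message body.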
Runner Orchestrator
This is the main entry point for the BayerCLAW runner. It orchestrates the entire execution flow, including repository interactions, string substitutions, workspace management, caching, input/output handling, and quality control checks.
Runner Repository Manager
This component manages all repository-related operations for the runner. It handles checking for previous runs, verifying file existence, clearing run status, reading job data, downloading inputs, and uploading outputs to the repository.
Runner String Processor
This component is dedicated to performing various string substitutions required by the runner, including general substitutions and specific image tag substitutions.
Runner Quality Control
This component performs quality control checks during the runner's execution. If a check fails, it can abort the execution.
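A quality control check that aborts on failure might look like the sketch below. It assumes, purely for illustration, that a check compares one numeric field of a JSON result file against a threshold; the `QCFailure` exception, the `run_qc_check` function, and the field names are hypothetical and do not describe BayerCLAW's actual check format.

```python
import json
import operator


class QCFailure(Exception):
    """Raised to abort the run when a quality control check fails."""


_COMPARATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


def run_qc_check(result_text, field, comparator, threshold):
    """Pass if `value comparator threshold` holds; otherwise raise QCFailure."""
    value = json.loads(result_text)[field]
    if not _COMPARATORS[comparator](value, threshold):
        raise QCFailure(f"QC failed: {field}={value} is not {comparator} {threshold}")
    return value


# Hypothetical QC result, for illustration only.
coverage = run_qc_check('{"mean_coverage": 31.5}', "mean_coverage", ">=", 30)
print(coverage)
```

Signalling failure with a dedicated exception lets the caller tell a deliberate QC abort apart from an unexpected error.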
Runner Workspace Manager
This component manages the runner's local workspace. It handles writing job data files and executing commands within the workspace environment.
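The two workspace responsibilities described above can be sketched with the standard library alone. The file name `job_data.json` and both function names are assumptions for this example, and a temporary directory stands in for the runner's workspace.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def write_job_data_file(workspace, job_data):
    """Write job data as JSON into the workspace (hypothetical file name)."""
    path = Path(workspace) / "job_data.json"
    path.write_text(json.dumps(job_data))
    return path


def run_in_workspace(workspace, argv):
    """Run a command with the workspace as its working directory.

    check=True raises CalledProcessError on a non-zero exit status.
    """
    return subprocess.run(
        argv, cwd=workspace, capture_output=True, text=True, check=True
    )


with tempfile.TemporaryDirectory() as workspace:
    write_job_data_file(workspace, {"sample": "S42"})
    # The command reads the job data file by its relative path.
    result = run_in_workspace(
        workspace,
        [
            sys.executable,
            "-c",
            "import json; print(json.load(open('job_data.json'))['sample'])",
        ],
    )
    output = result.stdout.strip()

print(output)
```

Running the command with `cwd` set to the workspace means relative paths in user commands resolve inside the workspace rather than wherever the runner process started.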
Runner Cache Manager
This component caches reference inputs for the runner so that they do not have to be retrieved again each time they are needed.
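A minimal sketch of this caching idea follows, assuming a local cache directory keyed by a hash of the input's URI. The function names, the cache layout, and the injected `download` callable are hypothetical; the real cache manager may differ, for example in how it handles concurrent access or changed files.

```python
import hashlib
import tempfile
from pathlib import Path


def cached_path(cache_dir, uri):
    """Map a URI to a stable location inside the cache directory."""
    digest = hashlib.sha256(uri.encode("utf-8")).hexdigest()[:16]
    filename = uri.rsplit("/", 1)[-1]
    return Path(cache_dir) / digest / filename


def get_reference(cache_dir, uri, download):
    """Return the cached copy of uri, calling download(uri, target) only on a miss."""
    target = cached_path(cache_dir, uri)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        download(uri, target)
    return target


# Stand-in for a real S3 download; records each call so cache hits are visible.
download_calls = []


def fake_download(uri, target):
    download_calls.append(uri)
    target.write_text("ACGT")


with tempfile.TemporaryDirectory() as cache_dir:
    first = get_reference(cache_dir, "s3://bucket/ref/genome.fa", fake_download)
    second = get_reference(cache_dir, "s3://bucket/ref/genome.fa", fake_download)
    same_path = first == second
    contents = first.read_text()

print(len(download_calls))
```

Hashing the full URI keeps files with the same name from different locations apart, while keeping the original file name makes the cached copy easy to recognize.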